*Increase in Reactor Regenerator Pressure Difference*¶

Introduction¶

The FCC (Fluidized Catalytic Cracking) condenser is a critical component in refinery operations. It process sweet Vacuum Gas Oil as a feed and cracks heavy oil into lighter oils like LPG and Gasoline.

Fluctuation in Reactor-Regenerator DP can lead to revarsal in catalyst circutation and hence for safety measures may leads to activation in ESD. This project applies Machine Learning (ML) techniques to detect anomalies in FCC condenser efficiency.

image.png

Data Preprocessing¶

The dataset contains operational parameters, including temperature, pressure, flow rates, and velocities. The following preprocessing steps were performed:

  1. Handling Missing Values: Missing values were filled using mean imputation.
  2. Feature Scaling: Standardized using StandardScaler.

Exploratory Data Analysis (EDA)¶

EDA was conducted to visualize trends and detect potential anomalies. Key findings:

  1. Line plot revealed variations in parameters like Temperature, Pressure, Flowrate, etc.
  2. PCA highlighted underlying structure in the dataset.

Anomaly Detection Techniques¶

1. Principal Component Analysis (PCA)

PCA reduced high-dimensional data to two principal components.
Anomalies were detected by analyzing data points far from the normal cluster.

2. Autoencoder (Deep Learning)

A neural network trained to reconstruct normal data patterns.
Reconstruction error was used to flag high-error data points as anomalies.

Data Cleaning, Feature Engineering, Predictive Modeling¶

Importing libraries and datasets

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
plt.rcParams['figure.figsize'] = [12, 6]
plt.rcParams.update({'font.size': 12})
In [2]:
df = pd.read_excel('columns.xlsx')
col = df['Symbol'].values
col = col.tolist()
In [3]:
df_stableFeedFlow = pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/NOC_stableFeedFlow_outputs.csv',header=None)
df_stableFeedFlow.set_index = df_stableFeedFlow.iloc[:,0]
df_stableFeedFlow = df_stableFeedFlow.drop(columns=0)
df_stableFeedFlow.columns= col
df_stableFeedFlow.sample(5)
Out[3]:
F3 Tatm T1 P4 deltaP P6 Fair T3 T2 Tr ... FLCO FSlurry FReflux Tfra T10 T20 V9 V8 V10 V11
1801 165.06 78.821 460.53 34.4 -6.4 28 2.6793 1566.7 615.99 968.96 ... 1644.9 218.89 3006.8 314.30 509.93 628.46 46.668 49.932 50.217 47.279
634 164.79 79.947 461.09 34.4 -6.4 28 2.6799 1561.6 616.01 969.05 ... 1645.1 213.78 2993.1 314.71 509.75 628.01 46.003 49.634 49.512 47.311
2183 165.06 80.040 460.64 34.4 -6.4 28 2.6831 1566.0 616.00 969.03 ... 1647.3 209.26 3001.1 314.86 509.84 628.04 45.829 49.743 49.279 47.429
2610 165.20 78.028 460.16 34.4 -6.4 28 2.6835 1569.8 615.99 969.01 ... 1644.6 208.76 2986.8 313.36 509.92 628.18 47.757 49.760 48.978 47.269
1803 165.03 78.832 460.24 34.4 -6.4 28 2.6795 1568.2 615.99 968.97 ... 1644.1 218.69 3005.7 314.29 509.92 628.44 46.681 49.915 50.222 47.233

5 rows × 46 columns

In [4]:
df_varyingFeedFlow=pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/NOC_varyingFeedFlow_outputs.csv',header=None)
df_varyingFeedFlow.set_index = df_varyingFeedFlow.iloc[:,0]
df_varyingFeedFlow = df_varyingFeedFlow.drop(columns=0)
df_varyingFeedFlow.columns= col
df_varyingFeedFlow.sample(5)
Out[4]:
F3 Tatm T1 P4 deltaP P6 Fair T3 T2 Tr ... FLCO FSlurry FReflux Tfra T10 T20 V9 V8 V10 V11
1193 168.40 77.862 460.78 34.4 -6.4 28 2.7362 1585.9 616.00 969.00 ... 1699.1 205.70 3517.3 322.11 513.91 633.47 39.600 56.977 52.924 50.034
5673 162.66 76.135 461.16 34.4 -6.4 28 2.6426 1548.0 616.00 969.01 ... 1606.8 221.66 2614.5 305.18 506.66 623.40 56.834 44.856 47.081 45.430
7041 165.80 76.944 460.74 34.4 -6.4 28 2.6944 1570.0 616.01 969.02 ... 1652.2 212.57 3081.2 314.41 510.76 629.54 46.994 51.151 49.849 47.619
5056 162.08 80.012 460.68 34.4 -6.4 28 2.6346 1547.3 615.99 969.03 ... 1591.9 230.72 2575.0 307.03 505.85 622.36 53.811 43.865 47.206 44.670
4831 162.04 79.589 460.43 34.4 -6.4 28 2.6340 1548.6 616.01 969.03 ... 1590.8 230.83 2564.0 306.53 505.78 622.23 54.441 43.762 47.107 44.615

5 rows × 46 columns

In [5]:
df_deltaP_increase = pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/deltaP_increase_outputs.csv',header=None)
df_deltaP_increase.set_index = df_deltaP_increase.iloc[:,0]
df_deltaP_increase = df_deltaP_increase.drop(columns=0)
df_deltaP_increase.columns= col
df_deltaP_increase.sample(5)
Out[5]:
F3 Tatm T1 P4 deltaP P6 Fair T3 T2 Tr ... FLCO FSlurry FReflux Tfra T10 T20 V9 V8 V10 V11
483 165.15 79.501 460.62 34.8 -6.8 28 2.6814 1566.7 616.00 968.97 ... 1649.6 211.56 3005.2 314.62 509.89 628.20 46.184 49.852 49.831 47.558
33 165.03 75.438 461.00 34.4 -6.4 28 2.6807 1563.6 616.00 969.00 ... 1640.7 213.97 2945.9 310.97 509.80 628.09 50.679 49.492 48.761 47.057
724 165.21 80.059 460.27 34.8 -6.8 28 2.6856 1569.2 616.00 969.04 ... 1647.8 206.93 3018.9 315.16 510.01 628.25 45.523 49.988 49.071 47.443
1153 164.97 78.208 460.59 34.8 -6.8 28 2.6783 1565.8 616.00 968.97 ... 1639.3 216.10 2968.5 313.23 509.71 628.03 47.883 49.475 49.367 46.980
1245 165.05 77.363 460.22 34.8 -6.8 28 2.6835 1568.6 616.01 969.04 ... 1647.2 211.48 2996.3 313.13 510.04 628.42 48.124 49.958 49.234 47.405

5 rows × 46 columns

EDA

In [6]:
df_deltaP_increase.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1440 entries, 0 to 1439
Data columns (total 46 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   F3           1440 non-null   float64
 1   Tatm         1440 non-null   float64
 2   T1           1440 non-null   float64
 3   P4           1440 non-null   float64
 4   deltaP       1440 non-null   float64
 5   P6           1440 non-null   int64  
 6   Fair         1440 non-null   float64
 7   T3           1440 non-null   float64
 8   T2           1440 non-null   float64
 9   Tr           1440 non-null   float64
 10  Treg         1440 non-null   float64
 11  Lsp          1440 non-null   float64
 12  Tcyc         1440 non-null   float64
 13  Tcyc - Treg  1440 non-null   float64
 14  Cco,sg       1440 non-null   int64  
 15  Co2,sg       1440 non-null   float64
 16  P5           1440 non-null   float64
 17  V4           1440 non-null   float64
 18  V6           1440 non-null   float64
 19  V7           1440 non-null   float64
 20  V3           1440 non-null   float64
 21  V1           1440 non-null   float64
 22  V2           1440 non-null   float64
 23  Frgc         1440 non-null   int64  
 24  Fsc          1440 non-null   int64  
 25  ACAB         1440 non-null   float64
 26  AWGC         1440 non-null   float64
 27  F5           1440 non-null   float64
 28  F7           1440 non-null   float64
 29  Fsg          1440 non-null   float64
 30  FV11         1440 non-null   int64  
 31  P1           1440 non-null   float64
 32  P2           1440 non-null   float64
 33  FLPG         1440 non-null   float64
 34  FLN          1440 non-null   float64
 35  FHN          1440 non-null   float64
 36  FLCO         1440 non-null   float64
 37  FSlurry      1440 non-null   float64
 38  FReflux      1440 non-null   float64
 39  Tfra         1440 non-null   float64
 40  T10          1440 non-null   float64
 41  T20          1440 non-null   float64
 42  V9           1440 non-null   float64
 43  V8           1440 non-null   float64
 44  V10          1440 non-null   float64
 45  V11          1440 non-null   float64
dtypes: float64(41), int64(5)
memory usage: 517.6 KB
In [7]:
df_deltaP_increase.describe().T
Out[7]:
count mean std min 25% 50% 75% max
F3 1440.0 165.012694 1.582447e-01 164.53000 164.890000 165.020000 165.140000 165.480000
Tatm 1440.0 78.342193 1.491834e+00 74.98100 77.191500 78.746000 79.687250 80.070000
T1 1440.0 460.899750 3.939347e-01 459.58000 460.650000 460.910000 461.140000 462.050000
P4 1440.0 34.732485 1.372139e-01 34.40000 34.800000 34.800000 34.800000 34.800000
deltaP 1440.0 -6.732483 1.372164e-01 -6.80000 -6.800000 -6.800000 -6.800000 -6.400000
P6 1440.0 28.000000 0.000000e+00 28.00000 28.000000 28.000000 28.000000 28.000000
Fair 1440.0 2.680606 2.895453e-03 2.67220 2.678575 2.680800 2.682800 2.687600
T3 1440.0 1564.149444 2.865294e+00 1555.80000 1562.200000 1564.100000 1566.000000 1573.500000
T2 1440.0 616.000063 7.019004e-03 615.98000 616.000000 616.000000 616.000000 616.020000
Tr 1440.0 968.996868 3.865328e-02 968.82000 968.980000 969.000000 969.020000 969.250000
Treg 1440.0 1250.000000 4.875090e-02 1249.90000 1250.000000 1250.000000 1250.000000 1250.100000
Lsp 1440.0 29.807753 2.940493e-01 29.46500 29.651000 29.733000 29.829250 31.453000
Tcyc 1440.0 1255.280000 5.018382e-02 1255.10000 1255.300000 1255.300000 1255.300000 1255.400000
Tcyc - Treg 1440.0 5.280165 3.503803e-02 5.14660 5.256475 5.282450 5.303500 5.354700
Cco,sg 1440.0 29883.704167 5.020733e+01 29756.00000 29852.000000 29883.000000 29913.250000 30112.000000
Co2,sg 1440.0 0.012481 1.555107e-04 0.01191 0.012373 0.012489 0.012586 0.012819
P5 1440.0 24.900000 6.752501e-13 24.90000 24.900000 24.900000 24.900000 24.900000
V4 1440.0 46.261990 3.630540e-01 45.37900 46.008750 46.372500 46.558000 46.840000
V6 1440.0 24.800935 8.767327e-02 24.53000 24.746000 24.809000 24.862250 25.002000
V7 1440.0 54.592518 5.902779e-02 54.42100 54.551000 54.595500 54.637000 54.735000
V3 1440.0 46.458697 2.311408e-01 46.29400 46.330000 46.346000 46.402500 46.995000
V1 1440.0 57.939251 2.065041e-01 57.34300 57.801000 57.934000 58.073000 58.619000
V2 1440.0 45.866167 3.305920e-01 45.03700 45.750000 46.026000 46.077000 46.211000
Frgc 1440.0 49584.829167 5.778035e+01 49414.00000 49544.750000 49589.000000 49628.250000 49742.000000
Fsc 1440.0 49584.416667 5.891687e+01 49396.00000 49544.000000 49588.000000 49627.250000 49831.000000
ACAB 1440.0 280.753229 1.339416e+00 277.20000 279.750000 281.145000 281.830000 282.770000
AWGC 1440.0 204.616125 2.498677e+00 198.60000 202.837500 205.390000 206.670000 208.490000
F5 1440.0 1990.668125 7.094397e+00 1970.20000 1985.900000 1990.500000 1995.300000 2014.000000
F7 1440.0 3737.047153 5.024231e+00 3722.70000 3733.600000 3737.200000 3740.700000 3747.800000
Fsg 1440.0 160.836479 1.739822e-01 160.33000 160.710000 160.850000 160.970000 161.260000
FV11 1440.0 28114.860417 2.735492e+02 27445.00000 27925.000000 28199.000000 28337.000000 28547.000000
P1 1440.0 14.638000 7.640988e-14 14.63800 14.638000 14.638000 14.638000 14.638000
P2 1440.0 35.042184 2.132075e-02 35.00000 35.028000 35.038000 35.056000 35.105000
FLPG 1440.0 3037.619167 4.410832e+01 2931.30000 3006.075000 3051.150000 3073.900000 3106.600000
FLN 1440.0 3918.058750 4.528881e+01 3853.20000 3878.700000 3905.050000 3953.225000 4029.600000
FHN 1440.0 708.819583 3.444213e+00 697.97000 706.727500 708.720000 711.040000 717.330000
FLCO 1440.0 1643.684931 3.785560e+00 1632.90000 1640.800000 1643.700000 1646.500000 1652.200000
FSlurry 1440.0 213.935750 3.735500e+00 205.90000 211.105000 214.095000 216.560000 224.080000
FReflux 1440.0 2984.278819 1.873678e+01 2938.60000 2972.300000 2986.800000 2996.200000 3029.200000
Tfra 1440.0 313.544986 1.203643e+00 310.71000 312.657500 313.910000 314.560000 315.330000
T10 1440.0 509.834722 1.324477e-01 509.50000 509.740000 509.835000 509.930000 510.150000
T20 1440.0 628.158278 1.590511e-01 627.84000 628.030000 628.140000 628.280000 628.470000
V9 1440.0 47.514651 1.504642e+00 45.38300 46.216000 47.063500 48.650500 51.247000
V8 1440.0 49.686217 1.915637e-01 49.28700 49.533750 49.693000 49.844000 50.131000
V10 1440.0 49.368616 4.827719e-01 47.87600 49.071000 49.350500 49.679000 50.565000
V11 1440.0 47.222166 2.078380e-01 46.63000 47.067000 47.227500 47.379000 47.687000
In [8]:
sns.heatmap(df_deltaP_increase.corr(),cmap='coolwarm')
Out[8]:
<AxesSubplot:>
In [9]:
for n,i in enumerate(df_stableFeedFlow.columns):
    plt.figure(figsize=(12,2))
    plt.plot(df_stableFeedFlow[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol']==i]['Description'].values)
    plt.show()
In [10]:
for n,i in enumerate(df_varyingFeedFlow.columns):
    plt.figure(figsize=(12,2))
    plt.plot(df_varyingFeedFlow[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol']==i]['Description'].values)
    plt.show()
In [11]:
for n,i in enumerate(df_deltaP_increase.columns):
    plt.figure(figsize=(12,2))
    plt.plot(df_deltaP_increase[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol']==i]['Description'].values)
    plt.show()

Scaling the data with mean=0 & std = 1 using Standard Scalar.

In [12]:
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
In [13]:
X = ss.fit_transform(df_stableFeedFlow)

Applying PCA

In [14]:
from sklearn.decomposition import PCA
pca = PCA()
In [15]:
X_pca = pca.fit_transform(X)
In [16]:
plt.figure(figsize=(15,6))
sns.set_style('whitegrid')
sns.lineplot(x=list(range(1,47)), y=np.cumsum(pca.explained_variance_ratio_), drawstyle='steps-pre')
sns.lineplot(x=list(range(0,46)),y=np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Eigen Values')
plt.ylabel('Ratio of Variance')
plt.title('Variance by each Eigen Value')
plt.show()

It can be clearly seen than 10 dimentions can describe more than 98% data, hence redcing the feature space from 46 to 10.

In [17]:
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)
In [18]:
plt.figure(figsize=(15,6))
sns.set_style('whitegrid')
sns.lineplot(x=list(range(1,11)), y=np.cumsum(pca.explained_variance_ratio_), drawstyle='steps-pre')
sns.lineplot(x=list(range(0,10)),y=np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Eigen Values')
plt.ylabel('Ratio of Variance')
plt.title('Variance by each Eigen Value')
plt.show()

Applying Autoencoders

In [19]:
X_train = X.reshape(2880,46,1)

Lets create a Sequential model with Bidirectional LSTM & train the model when plant is in steady state.
To avoid overfitting of model by using 20% dropout.

In [20]:
# del model
In [21]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256,return_sequences=True),input_shape=(46,1)))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128,return_sequences=True)))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 bidirectional (Bidirection  (None, 46, 512)           528384    
 al)                                                             
                                                                 
 dropout (Dropout)           (None, 46, 512)           0         
                                                                 
 bidirectional_1 (Bidirecti  (None, 46, 256)           656384    
 onal)                                                           
                                                                 
 dropout_1 (Dropout)         (None, 46, 256)           0         
                                                                 
 dense (Dense)               (None, 46, 1)             257       
                                                                 
=================================================================
Total params: 1185025 (4.52 MB)
Trainable params: 1185025 (4.52 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [22]:
model.fit(X_train,X_train,epochs=30)
Epoch 1/30
90/90 [==============================] - 29s 249ms/step - loss: 0.1533 - mae: 0.2153
Epoch 2/30
90/90 [==============================] - 22s 248ms/step - loss: 0.0031 - mae: 0.0389
Epoch 3/30
90/90 [==============================] - 22s 247ms/step - loss: 0.0025 - mae: 0.0345
Epoch 4/30
90/90 [==============================] - 29s 318ms/step - loss: 0.0023 - mae: 0.0321
Epoch 5/30
90/90 [==============================] - 37s 415ms/step - loss: 0.0021 - mae: 0.0305
Epoch 6/30
90/90 [==============================] - 37s 411ms/step - loss: 0.0019 - mae: 0.0292
Epoch 7/30
90/90 [==============================] - 37s 414ms/step - loss: 0.0018 - mae: 0.0284
Epoch 8/30
90/90 [==============================] - 37s 414ms/step - loss: 0.0018 - mae: 0.0277
Epoch 9/30
90/90 [==============================] - 37s 413ms/step - loss: 0.0017 - mae: 0.0272
Epoch 10/30
90/90 [==============================] - 37s 413ms/step - loss: 0.0017 - mae: 0.0267
Epoch 11/30
90/90 [==============================] - 37s 413ms/step - loss: 0.0017 - mae: 0.0267
Epoch 12/30
90/90 [==============================] - 37s 415ms/step - loss: 0.0016 - mae: 0.0260
Epoch 13/30
90/90 [==============================] - 38s 419ms/step - loss: 0.0016 - mae: 0.0258
Epoch 14/30
90/90 [==============================] - 37s 416ms/step - loss: 0.0016 - mae: 0.0257
Epoch 15/30
90/90 [==============================] - 37s 416ms/step - loss: 0.0015 - mae: 0.0254
Epoch 16/30
90/90 [==============================] - 37s 416ms/step - loss: 0.0015 - mae: 0.0252
Epoch 17/30
90/90 [==============================] - 38s 418ms/step - loss: 0.0015 - mae: 0.0251
Epoch 18/30
90/90 [==============================] - 37s 417ms/step - loss: 0.0015 - mae: 0.0250
Epoch 19/30
90/90 [==============================] - 38s 420ms/step - loss: 0.0015 - mae: 0.0248
Epoch 20/30
90/90 [==============================] - 38s 426ms/step - loss: 0.0015 - mae: 0.0248
Epoch 21/30
90/90 [==============================] - 37s 412ms/step - loss: 0.0014 - mae: 0.0245
Epoch 22/30
90/90 [==============================] - 37s 416ms/step - loss: 0.0014 - mae: 0.0244
Epoch 23/30
90/90 [==============================] - 38s 421ms/step - loss: 0.0014 - mae: 0.0245
Epoch 24/30
90/90 [==============================] - 37s 415ms/step - loss: 0.0014 - mae: 0.0245
Epoch 25/30
90/90 [==============================] - 39s 439ms/step - loss: 0.0014 - mae: 0.0243
Epoch 26/30
90/90 [==============================] - 40s 448ms/step - loss: 0.0014 - mae: 0.0242
Epoch 27/30
90/90 [==============================] - 37s 413ms/step - loss: 0.0014 - mae: 0.0245
Epoch 28/30
90/90 [==============================] - 37s 410ms/step - loss: 0.0014 - mae: 0.0242
Epoch 29/30
90/90 [==============================] - 37s 411ms/step - loss: 0.0014 - mae: 0.0241
Epoch 30/30
90/90 [==============================] - 37s 414ms/step - loss: 0.0014 - mae: 0.0243
Out[22]:
<keras.src.callbacks.History at 0x17996651e50>

Calculating the Reconstruction error using MAE.
Considering 99 percentile of error as an acceptable range, and it signifies the steady state operation.

In [23]:
error_ae = []
for i in range(X.shape[0]):
    y_pred = model.predict(X[i].reshape(1,46,1),verbose=None)[0,:,0]
    error_ae.append(np.abs(X[i]-y_pred).sum())
AE_CL = np.percentile(error_ae,99)
In [24]:
pd.Series(error_ae).plot()
plt.hlines(AE_CL,0,len(error_ae),colors='red',linestyles='--')
Out[24]:
<matplotlib.collections.LineCollection at 0x1799aefc430>

Calculating the Reconstruction error using Q-test, T22-test & Cosine similarity.
Considering 99 percentile of error as an acceptable range, and it signifies the steady state operation.

In [25]:
X_reconstructed = np.dot(X_pca,pca.components_)
error_pca = X-X_reconstructed
Q_train = np.sum(np.abs(error_pca),axis=1)
Q_CL = np.percentile(Q_train,99)
# Q_train plot with CL
plt.figure()
plt.plot(Q_train, color='black')
plt.plot([1,len(Q_train)],[Q_CL,Q_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('Q metric: training data')
plt.title(f'Q metrix is max: {Q_train.max()} at:{Q_train.argmax()}mins')
plt.show()
In [26]:
lambda_ = np.diag(pca.explained_variance_)
lambda_inv = np.linalg.inv(lambda_)
T_train = np.zeros(X_pca.shape[0])
for i in range(X_pca.shape[0]):
    T_train[i] = np.dot(np.dot(X_pca[i],lambda_inv),X_pca[i].T)
T_CL = np.percentile(T_train,99)
# T2_train plot with CL
plt.figure()
plt.plot(T_train, color='black')
plt.plot([1,len(T_train)],[T_CL,T_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('T$^2$ metric: training data')
plt.title(f'T$^2$ metrix is max: {np.array(T_train).max()} at:{np.array(T_train).argmax()}mins')
plt.show()
In [27]:
cosine = []
ed = []
for i in range(X.shape[0]):
    v1 = X[i]
    v2 = np.dot(X_pca,pca.components_)[i]
    cosine.append(np.dot(v1,v2)/(np.linalg.norm(v1)*np.linalg.norm(v2)))
    ed.append(np.linalg.norm(v1 - v2))
C_CL = np.min(cosine)
E_CL = np.percentile(ed,99)
# pd.Series(ed).plot(color='black')
# plt.plot([1,len(ed)],[E_CL,E_CL], linestyle='--',color='red', linewidth=2)
# plt.show()
pd.Series(cosine).plot(color='black')
plt.plot([1,len(cosine)],[C_CL,C_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('Cosine similarity metric: training data')
plt.title(f'Cosine Similarity')
plt.show()
In [28]:
Q_CL,T_CL,C_CL,E_CL,AE_CL
Out[28]:
(4.123113215519074,
 20.424350352568812,
 0.9427550112367162,
 0.9281694746755802,
 0.4382286902947803)

Let's create a function for test data preprocessing and testing the data with our model.

In [29]:
def Q_test(X,X_pca,pca_components_,Q_CL):
    X_reconstructed = np.dot(X_pca,pca_components_)
    error_pca = X-X_reconstructed
    Q_train = np.sum(np.abs(error_pca),axis=1)
    # Q_train plot with CL
    plt.figure()
    plt.plot(Q_train, color='black')
    plt.plot([1,len(Q_train)],[Q_CL,Q_CL], linestyle='--',color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('Q metric: training data')
    plt.title(f'Q metrix is max: {Q_train.max()} at:{Q_train.argmax()}mins')
    plt.show()
    return error_pca
In [30]:
def T_test(X_pca,explained_variance_,TCL):
    lambda_ = np.diag(pca.explained_variance_)
    lambda_inv = np.linalg.inv(lambda_)
    T_train = np.zeros(X_pca.shape[0])
    for i in range(X_pca.shape[0]):
        T_train[i] = np.dot(np.dot(X_pca[i],lambda_inv),X_pca[i].T)
    # T2_train plot with CL
    plt.figure()
    plt.plot(T_train, color='black')
    plt.plot([1,len(T_train)],[T_CL,T_CL], linestyle='--',color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('T$^2$ metric: training data')
    plt.title(f'T$^2$ metrix is max: {np.array(T_train).max()} at:{np.array(T_train).argmax()}mins')
    plt.show()
In [31]:
def cosine(X,X_transformed,pca_components_,C_CL,E_CL):
    cosine = []
    ed = []
    for i in range(X.shape[0]):
        v1 = X[i]
        v2 = np.dot(X_transformed,pca_components_)[i]
        cosine.append(np.dot(v1,v2)/(np.linalg.norm(v1)*np.linalg.norm(v2)))
        ed.append(np.linalg.norm(v1 - v2))
#     pd.Series(ed).plot(color='black')
#     plt.plot([1,len(ed)],[E_CL,E_CL], linestyle='--',color='red', linewidth=2)
#     plt.xlabel('Sample #')
#     plt.ylabel('Eucledian Distance metric: training data')
#     plt.show()
    pd.Series(cosine).plot(color='black')
    plt.plot([1,len(cosine)],[C_CL,C_CL], linestyle='--',color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('Cosine similarity metric: training data')
    plt.title(f'Cosine Similarity')
    plt.show()
In [32]:
def autoencoder(df_test,CL):
    X_test = ss.transform(df_test)
    error_ae = []
    error_sum = []
    for i in range(X_test.shape[0]):
        y_pred = model.predict(X_test[i].reshape(1,46,1),verbose=None)[0,:,0]
        error_ae.append(np.abs(X_test[i]-y_pred))
        error_sum.append(np.abs(X_test[i]-y_pred).sum())
    error_ae=np.array(error_ae)
    pd.Series(error_sum).plot(color = 'black')
    plt.hlines(CL,0,len(error_ae),colors='red',linestyles='--')
    plt.xlabel('Sample #')
    plt.ylabel('Reconstruction error by Autoencoder')
    return error_ae

Testing the model on Varying feed flow rate.¶

In [33]:
X = ss.transform(df_varyingFeedFlow)
X_test = pca.transform(X)
In [34]:
error_pca = Q_test(X,X_test,pca.components_,Q_CL)
In [35]:
T_test(X_test,pca.explained_variance_,T_CL)
In [36]:
cosine(X,X_test,pca.components_,C_CL,E_CL)
In [37]:
error_ae = autoencoder(df_varyingFeedFlow,AE_CL)

Testing the model on abnormal dataset.¶

In [38]:
X = ss.transform(df_deltaP_increase)
X_test = pca.transform(X)
In [39]:
error_pca = Q_test(X,X_test,pca.components_,Q_CL)
In [40]:
T_test(X_test,pca.explained_variance_,T_CL)
In [41]:
cosine(X,X_test,pca.components_,C_CL,E_CL)
In [42]:
error_ae = autoencoder(df_deltaP_increase,AE_CL)

Inference
During steady state operation the errors are within limit but, suddenly the error starts increasing after 200mins.
So, let’s check which parameters are deviating maximum form steady state.
Considering top 10 variables responsible for plant deviation.

Visualization¶

Q test Error

In [43]:
#%% Q contribution
error = np.abs(error_pca).sum(axis=1)
cum = []
for index,value in enumerate(error):
    if (value>Q_CL) and (len(cum)<15):
        previous_val = value
        cum.append(value)
        if len(cum) == 15:
            sample = index
            break
    else:
        cum=[]
# sample = ((pd.Series(error_pca.sum(axis=1))-pd.Series(error_pca.sum(axis=1)).shift()).abs()).argmax()
print('Time-',sample,'mins')
error_test_sample = error_pca[sample]
Q_contri = np.abs(error_test_sample) # *error_test_sample # vector of contributions
Time- 146 mins
In [44]:
plt.figure(figsize=[15,4])
plt.bar(['variable ' + str((i+1)) for i in range(len(Q_contri))], Q_contri)
plt.xticks(rotation = 80)
plt.ylabel('Q contributions')
plt.show()
In [46]:
plt.figure(figsize=(15,40))
print('Time-',sample,'mins')
for i,n in enumerate(np.argsort(Q_contri)[:-11:-1]):
    plt.subplot(5,2,i+1)
    plt.plot(df_deltaP_increase.iloc[:,n],'blue', linewidth=1)
    plt.xlabel('time (mins)')
    plt.ylabel(df['Symbol'][n])
    plt.title(df['Description'][n])
    plt.show
Time- 146 mins

Autoecoder Error

In [47]:
#%% Autoencoder Error
error = np.abs(error_ae).sum(axis=1)
cum = []
for index,value in enumerate(error):
    if (value>AE_CL) and (len(cum)<15):
        previous_val = value
        cum.append(value)
        if len(cum) == 15:
            sample = index
            break
    else:
        cum=[]
# sample = ((pd.Series(error_ae.sum(axis=1))-pd.Series(error_ae.sum(axis=1)).shift()).abs()).argmax()
print('Time-',sample,'mins')
error_test_sample = error_pca[sample]
Q_contri = np.abs(error_test_sample) # *error_test_sample # vector of contributions
Time- 198 mins
In [48]:
plt.figure(figsize=(15,45))
print('Time-',sample,'mins')
for i,n in enumerate(np.argsort(error_ae[sample])[:-11:-1]):
    plt.subplot(5,2,i+1)
    plt.plot(df_deltaP_increase.iloc[:,n],'blue', linewidth=1)
    plt.xlabel('time (mins)')
    plt.ylabel(df['Symbol'][n])
    plt.title(df['Description'][n])
    plt.show
Time- 198 mins

Conclusion¶

RCSV opening has suddenly decreased, Catalyst level in reactor has suddenly increased & then reduced which indicates that CV is closing to reduce the catalyst flow form Reactor to Regenerator. This is due to increase in Reactor pressure and temperature which increases the flow of spent catalyst through SCSV. This has increased the Pressure difference between Reactor and regenerator.

Future Work¶

  1. Integrate with IoT sensors for real-time anomaly tracking.
  2. Develop a predictive maintenance dashboard using Power BI.